GEOG 5680 6680
Introduction to R
12 Spatial data in R

Author

Simon Brewer

Published

May 2, 2025

Introduction

In this lab, we will explore some of R’s functionality with spatial data, with special attention given to the sf package. For more information about sf, you can visit their website. Robin Lovelace’s book Geocomputation with R (available online) is also a really helpful and educational source for learning sf. For the longer examples, it is highly recommended that you run them from an R script to save on re-typing.

Before starting, remember to create a working directory (e.g. module12). Next download the files for today’s lab from the Canvas page and move them to this directory. You’ll need the following files:

  • Climate dataset for Western North America: WNAclimate.csv
  • A shapefile of UTA light rail routes: LightRail_UTA.zip
  • A shapefile of UTA light rail stations: LightRailStations_UTA.zip
  • A shapefile of Utah counties: utahcounty.zip
  • A LandSat8 scene for part of California: rs.zip
  • A shapefile of California places: ca_places.zip

Note that for the shapefiles, you will need to download and decompress each zip file to get the shapefile and its associated files. If you are not sure how to do this, please let me know.

Base R has no structure for spatial data, so you will need to install the following packages (you should have some of these from previous modules):

  • sf
  • terra
  • RColorBrewer
  • ggplot2
  • viridis
library(ggplot2)
library(terra)
library(RColorBrewer)
library(sf)
library(viridis)

Intro to sf

What is sf?

sf is an R package designed to work with spatial data organized as “simple features”, an ISO standard for spatial data. The sf package is able to provide all the functionality it does because it interfaces with three widely adopted programming standards: PROJ, GDAL, and GEOS. These provide for coordinate reference systems, reading and writing of spatial data, and geometric operations, respectively, but more on this in a moment.

Note that all sf functions are prefixed with st_ (a legacy of this R package’s origins in PostGIS, where ‘st’ means “spatial type”). If this is not already installed on your computer, you’ll need to install it before going any further:

install.packages("sf")

What is a simple feature?

A simple feature is, in the words of the sf authors, “a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects” (ref). In other words, it is structured data that provides information about a location in space, including its shape.

One great advantage of sf is that the data is stored in R’s memory as an extended dataframe. This means that all the functions that you can apply to a dataframe should work with an sf object. To demonstrate this, we’ll load a file of county outlines for North Carolina (this file is included when you install the sf package). First load the sf library:

library(sf)

Now load the shapefile using st_read(). This is a generic function (more on this below) for loading spatial data, and will work with most vector formats.

path_to_data <- system.file("shape/nc.shp", package="sf")

north_carolina <- st_read(path_to_data, quiet = TRUE)

north_carolina <- north_carolina[ , c("CNTY_ID", "NAME", "AREA", "PERIMETER")]

north_carolina
Simple feature collection with 100 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS:  NAD27
First 10 features:
   CNTY_ID        NAME  AREA PERIMETER                       geometry
1     1825        Ashe 0.114     1.442 MULTIPOLYGON (((-81.47276 3...
2     1827   Alleghany 0.061     1.231 MULTIPOLYGON (((-81.23989 3...
3     1828       Surry 0.143     1.630 MULTIPOLYGON (((-80.45634 3...
4     1831   Currituck 0.070     2.968 MULTIPOLYGON (((-76.00897 3...
5     1832 Northampton 0.153     2.206 MULTIPOLYGON (((-77.21767 3...
6     1833    Hertford 0.097     1.670 MULTIPOLYGON (((-76.74506 3...
7     1834      Camden 0.062     1.547 MULTIPOLYGON (((-76.00897 3...
8     1835       Gates 0.091     1.284 MULTIPOLYGON (((-76.56251 3...
9     1836      Warren 0.118     1.421 MULTIPOLYGON (((-78.30876 3...
10    1837      Stokes 0.124     1.428 MULTIPOLYGON (((-80.02567 3...


Note that there are a few lines of metadata, and then each row represents one spatial feature, along with a set of associated information. You can summarize this somewhat verbose printout by noting that simple features fit a simple formula:


\[ sf = attributes + geometry + crs \]

This formula also suggests the kinds of ways that you might interact with an sf object by, for example, changing its crs, or filtering based on its attributes (or geometry), or manipulating its geometry.
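Each of the three components has a dedicated accessor in sf. A quick sketch, using the north_carolina object loaded above:

```r
# attributes: the plain data.frame part, without the geometry
head(st_drop_geometry(north_carolina))

# geometry: the list of shapes (an sfc object)
st_geometry(north_carolina)

# crs: the coordinate reference system
st_crs(north_carolina)
```

We’ll meet each of these functions again in the sections below.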


Attributes

Attributes are properties of a feature. In this case, the features are counties in North Carolina, and their attributes are things like name and area. In an sf data.frame, each feature is a row, and each attribute is a column. In the north_carolina object, for example, the first feature has the name “Ashe” and its county ID is 1825.


Geometry

A very special attribute column is called the geometry (sometimes labeled ‘geom’ or ‘shape’), shown above in the last column. It consists of a point or set of points (specifically, their coordinates) that define the shape and location of the feature. The simple feature standard includes 17 geometry types, 7 of which are supported by sf: point, multipoint, linestring, multilinestring, polygon, multipolygon, and geometry collection.



Figure 2.2 in Geocomputation with R


As mentioned already, these geometries are just a series of points:

point_one <- st_point(c(0, 3))

point_two <- st_point(c(5, 7))

a_line <- st_linestring(c(point_one, point_two))

If you print these geometries

point_one
POINT (0 3)
a_line
LINESTRING (0 3, 5 7)


you see that they are represented as a text string. This is the Well Known Text (WKT) standard for specifying geometries. It tells us what kind of geometry the feature is and lists its x-y coordinates separated by commas.
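The same pattern extends to the other geometry types. As a sketch, a polygon is built from a list of coordinate matrices, where the first matrix is the outer ring and must close on itself:

```r
# the ring's first and last coordinates must be identical to close it
outer_ring <- rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))
a_polygon <- st_polygon(list(outer_ring))
a_polygon
# POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))
```

Extra matrices in the list would define holes in the polygon.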

If you want to know what geometry type your simple feature contains, try:

st_geometry_type(a_line)
[1] LINESTRING
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

To show the type of the first feature in the North Carolina shapefile:

st_geometry_type(north_carolina[1, ])
[1] MULTIPOLYGON
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE


CRS

The final ingredient in a simple feature is its spatial reference system, or coordinate reference system (CRS). A CRS provides two crucial pieces of information: (i) what rule we use to assign coordinates to points and (ii) what datum to use. It is not an exaggeration to say that the CRS is the most important element of a simple feature, for without a CRS, the numbers in the geometry column are just that, numbers, rather than full-blooded spatial coordinates.

Understanding what a coordinate assignment rule does is beyond the scope of this lab, but the datum deserves some attention. In effect, it specifies three things:

  1. the origin or the point on the earth’s surface where the coordinates are POINT (0 0),
  2. the scale of the coordinates, for example, whether we should think of POINT (5 7) as being 5 meters east and 7 meters north of the origin, or - worse - 5 feet east and 7 feet north, and
  3. the orientation of the system, or which way is up?

As with the geometries, the standard for representing CRS is WKT, though the easiest way to identify a CRS is to use its EPSG code. To find the EPSG code for a CRS, you can visit this website: spatialreference.org.

The most widely used CRS is the World Geodetic System 84 (WGS 84, a geographic system) whose EPSG code is 4326:

st_crs(4326)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        MEMBER["World Geodetic System 1984 (G2296)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]


If you are familiar with the PROJ4-string syntax, you can retrieve that from a CRS with:

st_crs(4326)$proj4string
[1] "+proj=longlat +datum=WGS84 +no_defs"

However, current open standards specified by PROJ and GDAL discourage the use of PROJ4-string syntax in favor of WKT, so it is probably best to get used to the latter now.


Bounding Box

There’s actually one more element to a simple feature, but it is not as vital as the others and is really already implicit in the geometry. That is the bounding box. This is an object defined by the spatial extent of the data: the minimum and maximum x and y coordinates. You can retrieve the bounding box of a simple feature this way:

st_bbox(north_carolina)
     xmin      ymin      xmax      ymax 
-84.32385  33.88199 -75.45698  36.58965 

There are myriad uses for the bounding box, though we need not dwell on them here.
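One such use, as a sketch: st_as_sfc() can convert a bounding box into a polygon geometry, which is handy for plotting a data extent or cropping other layers:

```r
# turn the bounding box into a rectangular polygon
bbox_poly <- st_as_sfc(st_bbox(north_carolina))

# plot the extent with the counties on top
plot(bbox_poly)
plot(st_geometry(north_carolina), add = TRUE)
```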

Read and Write

Reading and writing spatial data, it turns out, is quite the chore. The solution sf relies on is to interface with GDAL, which handles lots of different spatial data types (it’s kinda its whole purpose). Currently supported (vector) spatial data types can be found at GDAL.org. Perhaps the most common spatial data type - because ESRI is a thing - is the shapefile, which has a .shp file extension.


Reading in spatial data

In sf, the function for reading in spatial data is st_read. Here is the nitty-gritty and, perhaps, needlessly verbose version first:


utah <- st_read(dsn = "./data/utahcounty/utahcounty.shp",
                layer = "utahcounty",
                drivers = "ESRI Shapefile")
options:        ESRI Shapefile 
Reading layer `utahcounty' from data source 
  `/Users/u0784726/Dropbox/Data/devtools/geog5680/data/utahcounty/utahcounty.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 29 features and 13 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -114.0529 ymin: 36.99766 xmax: -109.0416 ymax: 42.00171
CRS:           NA

dsn stands for “data source name” and specifies where the data is coming from, whether a file directory, a database, or something else. layer is the layer in the data source to be read in. Finally, drivers tells GDAL what format the file is in or what structure it has, so it knows how to correctly interpret the file. All of this information is printed to the console when you execute st_read.

In this case, we are using a simple ESRI shapefile, so the data source and layer are basically the same thing. Furthermore, sf is good at guessing the driver based on the file extension, so the driver does not normally need to be specified. Hence, we could just as well have written:

utah <- st_read("./data/utahcounty/utahcounty.shp", quiet = TRUE)

And here’s what this looks like. The combination of st_geometry and plot extracts only the spatial geometry (the polygons) and plots those. We’ll look more closely at plotting these and making maps later.

plot(st_geometry(utah))


Converting non-spatial data to simple features

Sometimes you have spatial data, but it is not in a spatial data format. Usually, this means you have a table or spreadsheet with columns for the x and y coordinates.

wna_climate <- read.csv("./data/WNAclimate.csv")

head(wna_climate)
  WNASEQ     LONDD   LATDD ELEV totsnopt annp Jan_Tmp Jul_Tmp
1      1 -107.9333 50.8736  686  78.7869  326   -13.9    18.8
2      2 -105.2670 55.2670  369 145.3526  499   -21.3    16.8
3      3 -102.5086 41.7214 1163  42.6544  450    -4.2    23.3
4      4 -110.2606 44.2986 2362 255.1009  489   -10.9    14.1
5      5 -114.1500 59.2500  880 164.8924  412   -23.9    14.4
6      6 -120.6667 57.4500  900 141.9260  451   -17.5    13.8


This can be converted to a simple feature using the st_as_sf function like so:

wna_climate <- st_as_sf(wna_climate, 
                        coords = c("LONDD", "LATDD"),
                        crs = 4326)

wna_climate
Simple feature collection with 2012 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -138 ymin: 29.8333 xmax: -100 ymax: 60
Geodetic CRS:  WGS 84
First 10 features:
   WNASEQ ELEV totsnopt annp Jan_Tmp Jul_Tmp                  geometry
1       1  686  78.7869  326   -13.9    18.8 POINT (-107.9333 50.8736)
2       2  369 145.3526  499   -21.3    16.8   POINT (-105.267 55.267)
3       3 1163  42.6544  450    -4.2    23.3 POINT (-102.5086 41.7214)
4       4 2362 255.1009  489   -10.9    14.1 POINT (-110.2606 44.2986)
5       5  880 164.8924  412   -23.9    14.4     POINT (-114.15 59.25)
6       6  900 141.9260  451   -17.5    13.8   POINT (-120.6667 57.45)
7       7 1100 183.4187  492   -18.8    13.2 POINT (-119.7167 56.7167)
8       8 1480 191.5657  606   -11.0    12.8    POINT (-114.6 50.7667)
9       9  651 144.9015  443   -22.4    15.3 POINT (-122.1667 59.5167)
10     10  725 151.0778  455   -23.5    15.4    POINT (-112.1 57.7667)

The function just needs to know what columns the x and y coordinates are in and what CRS they are specified in. And here’s what it looks like:

plot(st_geometry(wna_climate), pch = 19, col = alpha("darkgreen", 0.5))


Writing spatial data

The sf function for writing simple features to disk is st_write. It is almost an exact mirror of st_read, but it also requires that you specify the simple feature object in your R environment that you want to write to disk. If the layer already exists, you will need to specify delete_layer = TRUE.

st_write(obj = wna_climate,
         dsn = "./data/wnaclim.shp",
         layer = "wnaclim",
         drivers = "ESRI Shapefile")


or, more simply:

st_write(wna_climate, dsn = "./data/wnaclim.shp")

CRS operations

The cardinal rule for working with any spatial data is to make sure all of it is in the same CRS. This ensures that any analysis which combines multiple sources is correctly comparing values at the same locations. Never ever ever ever do anything with your data until you are sure you’ve got the CRS right.
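A minimal sketch of this check, assuming both layers already carry a valid CRS (the utah layer only gets one in the next section):

```r
# compare CRSs before combining two layers; transform one if they differ
if (st_crs(utah) != st_crs(wna_climate)) {
  utah <- st_transform(utah, st_crs(wna_climate))
}
```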

The st_crs() function allows you to quickly check the CRS for any object.

st_crs(utah)
Coordinate Reference System: NA

This shows as missing (NA), as there was no .prj file with this shapefile. We can set it using an EPSG code. The shapefile is in WGS84, and the EPSG code for this is 4326. There are two methods to set the CRS for a spatial object: st_crs<- and st_set_crs.

# method 1
utah <- st_set_crs(utah, 4326)

# method 2
st_crs(utah) <- 4326

If we now recheck the CRS, you’ll see that it is now fully informed in WKT format:

st_crs(utah)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        MEMBER["World Geodetic System 1984 (G2296)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Note: this should only be used when the simple feature is missing a CRS and you know what it is. It is NOT for re-projecting the sf object to a new coordinate system.


Reprojecting CRS

The st_transform() function allows you to project your sf object to a new CRS. This is particularly useful if you have multiple data sources with different original coordinate systems. Here, we’ll reproject the Utah county data to UTM Zone 12N (EPSG code 32612):

utah <- st_transform(utah, crs = 32612)

st_crs(utah)
Coordinate Reference System:
  User input: EPSG:32612 
  wkt:
PROJCRS["WGS 84 / UTM zone 12N",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            MEMBER["World Geodetic System 1984 (G2296)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 12N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-111,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Navigation and medium accuracy spatial referencing."],
        AREA["Between 114°W and 108°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - Alberta; Northwest Territories (NWT); Nunavut; Saskatchewan. Mexico. United States (USA)."],
        BBOX[0,-114,84,-108]],
    ID["EPSG",32612]]

As a reminder: when you read in spatial data, the first thing you should do is check the CRS with st_crs() and re-project with st_transform() if necessary.


You can also check the EPSG code (if specified):

st_crs(utah)$epsg
[1] 32612
st_crs(wna_climate)$epsg
[1] 4326


And you can get the name of a CRS this way:

format(st_crs(utah))
[1] "WGS 84 / UTM zone 12N"


Attribute operations

The attribute part of an sf object is a data.frame, so you can use all the methods we have previously looked at for data manipulation when working with attributes.

class(utah)
[1] "sf"         "data.frame"

If you enter the name of an sf object, it will print the first few rows of the attribute table:

utah
Simple feature collection with 29 features and 13 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 228583.4 ymin: 4094744 xmax: 673944.6 ymax: 4653572
Projected CRS: WGS 84 / UTM zone 12N
First 10 features:
   SP_ID COUNTYNBR  ENTITYNBR ENTITYYR       NAME FIPS STATEPLANE POP_LASTCE
1      0        13 2010131010     2010       KANE   25      South       7125
2      1        14 2010141010     2010    MILLARD   27    Central      12503
3      2        20 2010201010     2010    SANPETE   39    Central      27822
4      3        04 2010041010     2010     CARBON    7    Central      21403
5      4        25 2010251010     2010       UTAH   49    Central     516564
6      5        03 2010031010     2010      CACHE    5      North     112656
7      6        22 2010221010     2010     SUMMIT   43      North      36324
8      7        27 2010271010     2010 WASHINGTON   53      South     138115
9      8        10 2010101010     2010      GRAND   19    Central       9225
10     9        24 2010241010     2010     UINTAH   47    Central      32588
   POP_CURRES FIPS_STR COLOR4 Shape_Leng  Shape_Area
1        7125    49025      4   559368.8 10631563850
2       12503    49027      3   576258.3 17708674898
3       27822    49039      2   317646.9  4146734071
4       21403    49007      1   333190.1  3844121594
5      516564    49049      3   481098.2  5544905821
6      112656    49005      2   290082.5  3035354927
7       36324    49043      1   477597.9  4870022623
8      138115    49053      1   347253.9  6297924699
9        9225    49019      1   516092.9  9539294359
10      32588    49047      2   540619.1 11661945600
                         geometry
1  POLYGON ((425313.9 4154648,...
2  POLYGON ((393592.1 4378949,...
3  POLYGON ((448347.6 4407164,...
4  POLYGON ((495617.5 4407122,...
5  POLYGON ((449722.4 4491979,...
6  POLYGON ((414165.9 4623363,...
7  POLYGON ((447597 4510365, 4...
8  POLYGON ((281567.5 4165110,...
9  POLYGON ((667230.1 4373926,...
10 POLYGON ((651528.9 4524595,...


Select Columns

As this is a dataframe at heart, we can use the same functions we looked at in the previous lab to manipulate the data:

# method 1 (base R)
utah2 <- utah[ , c("NAME", "FIPS", "POP_CURRES")]

# method 2 (dplyr)
library(dplyr)
utah2 <- utah %>%
  select(NAME, FIPS, POP_CURRES)

names(utah)
 [1] "SP_ID"      "COUNTYNBR"  "ENTITYNBR"  "ENTITYYR"   "NAME"      
 [6] "FIPS"       "STATEPLANE" "POP_LASTCE" "POP_CURRES" "FIPS_STR"  
[11] "COLOR4"     "Shape_Leng" "Shape_Area" "geometry"  
names(utah2)
[1] "NAME"       "FIPS"       "POP_CURRES" "geometry"  

Notice this very important difference between regular data.frames and sf data.frames: when you subset by columns, even though you do not explicitly state that you want to keep the geometry column, it keeps that column anyway. In this sense, the geometry column is said to be “sticky”.


Filter Rows

Subsetting the data by rows works in the same way as before. So we can carry out conditional selection of locations by using the usual comparison operators (<, <=, ==, !=, >=, >). For example, to select only the counties with over 50,000 people:

# method 1
utah3 <- utah[utah$POP_CURRES > 50000, ]

# method 2
utah3 <- utah2 %>%
  filter(POP_CURRES > 50000)
plot(st_geometry(utah), col = alpha("gray", 0.2), pch = 19)

plot(st_geometry(utah3), col = "darkred", pch = 19, add = TRUE)


Add Column

New variables can easily be appended to an existing sf object using the following notation:

utah$area_km2 <- utah$Shape_Area / 1e6

# method 2
utah <- utah %>%
  mutate(area_km2 = Shape_Area / 1e6)

names(utah)
 [1] "SP_ID"      "COUNTYNBR"  "ENTITYNBR"  "ENTITYYR"   "NAME"      
 [6] "FIPS"       "STATEPLANE" "POP_LASTCE" "POP_CURRES" "FIPS_STR"  
[11] "COLOR4"     "Shape_Leng" "Shape_Area" "geometry"   "area_km2"  


Extract Column

If you need to extract any variable from an sf object to a standard R vector, you can again use the standard notation. Note that if you use select to specify the columns, you also need to add st_drop_geometry to remove the geometry:

# method 1
area_km2 <- utah$area_km2

# method 2
area_km2 <- utah %>%
  select(area_km2) %>%
  st_drop_geometry()

area_km2$area_km2[1:10]
 [1] 10631.564 17708.675  4146.734  3844.122  5544.906  3035.355  4870.023
 [8]  6297.925  9539.294 11661.946


Get Geometry

If you need only the geometry (the set of coordinates, or vector definitions), these can be extracted as follows:

geometry <- st_geometry(utah)

geometry
Geometry set for 29 features 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 228583.4 ymin: 4094744 xmax: 673944.6 ymax: 4653572
Projected CRS: WGS 84 / UTM zone 12N
First 5 geometries:
POLYGON ((425313.9 4154648, 425715.6 4154644, 4...
POLYGON ((393592.1 4378949, 393813.7 4378945, 3...
POLYGON ((448347.6 4407164, 448363.5 4407163, 4...
POLYGON ((495617.5 4407122, 496010.1 4407117, 4...
POLYGON ((449722.4 4491979, 449736.6 4491978, 4...


Drop Geometry

In case you just want the attributes, not the geometry:

attributes <- st_drop_geometry(utah)

head(attributes)
  SP_ID COUNTYNBR  ENTITYNBR ENTITYYR    NAME FIPS STATEPLANE POP_LASTCE
1     0        13 2010131010     2010    KANE   25      South       7125
2     1        14 2010141010     2010 MILLARD   27    Central      12503
3     2        20 2010201010     2010 SANPETE   39    Central      27822
4     3        04 2010041010     2010  CARBON    7    Central      21403
5     4        25 2010251010     2010    UTAH   49    Central     516564
6     5        03 2010031010     2010   CACHE    5      North     112656
  POP_CURRES FIPS_STR COLOR4 Shape_Leng  Shape_Area  area_km2
1       7125    49025      4   559368.8 10631563850 10631.564
2      12503    49027      3   576258.3 17708674898 17708.675
3      27822    49039      2   317646.9  4146734071  4146.734
4      21403    49007      1   333190.1  3844121594  3844.122
5     516564    49049      3   481098.2  5544905821  5544.906
6     112656    49005      2   290082.5  3035354927  3035.355

Note: this is actually a special sort of data.frame called a tibble. Not important to know about here, but does print slightly differently.

Spatial operations

Spatial operations are like attribute operations, but they work with the geometry column rather than the attributes. There are loads of these functions, but we will just review some of the more important ones here.

Spatial Filter

This is probably the biggest one. Basically, you are taking one geometry and using it to filter other geometries. To demonstrate this, first we’ll make some random points in the utah simple feature, using st_sample to generate the random points:

set.seed(1234)

random_pnts <- st_sample(utah, size = 500)

random_pnts <- st_as_sf(random_pnts)
plot(st_geometry(utah))

plot(st_geometry(random_pnts), 
     col = alpha("red", 0.5), 
     pch = 19, 
     add = TRUE)

Now, we can use one geometry to filter out a second one. To obtain just the points in, say, Salt Lake County, we first subset the Utah sf object to extract only this county:

slcounty <- subset(utah, NAME == "SALT LAKE")

Then you can filter the points using this new object, either by using the st_filter() function, or by using the county as an index in the [ , ] notation:

filtered_pnts <- st_filter(random_pnts, slcounty)

filtered_pnts <- random_pnts[slcounty, ]
plot(st_geometry(utah))

plot(st_geometry(filtered_pnts), 
     col = "red", 
     pch = 19, 
     add = TRUE)


Topological Relations

Internally, st_filter assumes a “topological” or spatial relationship, defined by what the sf authors refer to as a spatial predicate (the .predicate argument). The default predicate is st_intersects, which keeps the geometries of the first object that intersect the second. We can, however, specify other spatial relationships. For example, to get all the points outside Salt Lake County:

filtered_pnts <- st_filter(random_pnts, slcounty, .predicate = st_disjoint)

Another useful predicate is st_is_within_distance, which requires that you pass an additional distance (dist) argument to the filter. The dist argument is in units specified by the CRS, in this case meters.

filtered_pnts <- st_filter(random_pnts, 
                           slcounty, 
                           .predicate = st_is_within_distance,
                           dist = 50000)


Geometric operations

With spatial operations, the geometry is preserved (mostly). With geometric operations, the whole point is to manipulate the geometry. Again, we are just going to hit the highlights. It is worth emphasizing that these operations will often behave differently depending on the geometry type.


Centroid

the_heart_of_slc <- st_centroid(slcounty)
Warning: st_centroid assumes attributes are constant over geometries
plot(st_geometry(slcounty))

plot(st_geometry(the_heart_of_slc), pch = 17, 
     col = "red", cex = 2, add = TRUE)


Buffer

st_buffer() expands a geometry outward by a fixed distance, given in the units of the CRS (meters here, so 50 km):

the_heft_of_slc <- st_buffer(slcounty, dist = 50000)
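To see what the buffer did, we can plot it behind the original county outline (a sketch, following the plotting pattern used above):

```r
# the buffered polygon extends 50 km beyond the county boundary
plot(st_geometry(the_heft_of_slc), border = "blue")
plot(st_geometry(slcounty), add = TRUE)
```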


Union

This one merges geometries and dissolves interior borders when applied to polygons.

utah_boundary <- st_union(utah)

plot(st_geometry(utah_boundary))


Cast

To cast a geometry is to change it from one geometry type to another. For example, to convert the boundary of Salt Lake County to points (the vertices of the polygon):

slc_points <- st_cast(slcounty, "POINT")
Warning in st_cast.sf(slcounty, "POINT"): repeating attributes for all
sub-geometries for which they may not be constant
plot(st_geometry(slc_points), col = "darkorange", pch = 19)

If we convert to a LINESTRING object, this acts to separate out the individual polygons:

utah_lines <- st_cast(utah_boundary, "MULTILINESTRING")

utah_lines <- st_cast(utah_lines, "LINESTRING")

Plotting

graphics

To make simple plots of an sf object, you can use R’s base function plot():

plot(utah2)


Notice that it creates a separate plot for each attribute. If you would prefer to plot just the geometry, as we did above, use st_geometry():

plot(st_geometry(utah2))

ggplot2

One of the easiest ways to improve on these base plots is to use ggplot2. This contains a special plotting geometry, geom_sf, designed to work with sf objects. (Note that geom_sf refers to a ggplot2 geometry, not an sf geometry. Confusing, right?)

We can now plot this by calling the ggplot() function and adding the sf object with geom_sf:

ggplot() + 
  geom_sf(data = slcounty) +
  theme_bw()

Note that even though the map is projected in UTM coordinates, the axes are set to latitude and longitude.
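If you would rather see the projected coordinates on the axes, coord_sf() takes a datum argument that controls the graticule; a sketch using the UTM Zone 12N code from earlier:

```r
# draw the axis graticule in UTM Zone 12N coordinates (meters)
ggplot() +
  geom_sf(data = slcounty) +
  coord_sf(datum = st_crs(32612)) +
  theme_bw()
```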

Multiple Geometries

Multiple layers can be added to a plot by adding additional geom_sf functions. Here, we read in two additional shapefiles: one containing the locations of light rail stations, and one containing the light rail tracks.

lightrail <- st_read("./data/LightRail_UTA/LightRail_UTA.shp", quiet = TRUE)

stations <- st_read("./data/LightRailStations_UTA/LightRailStations_UTA.shp",
                    quiet = TRUE)
ggplot() + 
  geom_sf(data = slcounty) +
  geom_sf(data = lightrail, col = 'blue') +
  geom_sf(data = stations, col = 'darkorange', alpha = 0.6) +
  theme_bw()

To label a feature, we can extract its coordinates with st_coordinates() and pass them to geom_label(). For example, to label the Stadium station:

stadium <- stations %>%
  filter(STATIONNAM == "Stadium")

stadium <- cbind(stadium, st_coordinates(stadium))
ggplot() + 
  geom_sf(data = slcounty) +
  geom_sf(data = lightrail, col = 'blue') +
  geom_sf(data = stations, col = 'darkorange', alpha = 0.6) +
  geom_label(data = stadium, aes(X, Y, label = STATIONNAM), size = 2.5) +
  theme_bw()

Plotting attributes

We can create thematic maps by specifying the name of a variable in the geom_sf() function:

names(utah2)
[1] "NAME"       "FIPS"       "POP_CURRES" "geometry"  
ggplot() + 
  geom_sf(data = utah2, aes(fill = POP_CURRES)) +
  theme_bw()

As the population values are heavily skewed, a log-transformed color scale with manual breaks makes the map easier to read:

my_breaks <- c(0, 10000, 100000, 1000000)
ggplot() + 
  geom_sf(data = utah2, aes(fill = POP_CURRES)) +
  scale_fill_continuous(trans = "log",
                        breaks = my_breaks, labels = my_breaks) +
  theme_bw()


Manual Color Scale

Here, we will use the viridis color scale, which is colorblind safe. This comes with several color palette options.

ggplot() + 
  geom_sf(data = utah2, aes(fill = POP_CURRES)) +
  scale_fill_viridis(option = "viridis") +
  theme_bw()

ggplot() + 
  geom_sf(data = utah2, aes(fill = POP_CURRES)) +
  scale_fill_viridis(option = "magma", trans = "log",
                     breaks = my_breaks, labels = my_breaks) +
  theme_bw()

Raster data

Previously, we were working with vector spatial data. These are geometries composed of points defined by their coordinates. An alternative form of spatial data is known as a raster. This is gridded data. It takes the form of a rectangle composed of squares of equal size, which are sometimes called ‘cells’ or ‘pixels’. Each cell stores some kind of value.

This simplifies the geometry, which can be specified by two pieces of information: the spatial extent of the raster and the resolution of the cells. Here we create a blank raster with 10 rows and 10 columns, with a resolution of 10x10 using the rast() function. We then assign random values to each cell:

r <- rast(nrow = 10, ncol = 10, 
          xmin = 0, xmax = 100, 
          ymin = 0, ymax = 100)

r[] <- runif(n = 100)

r
class       : SpatRaster 
dimensions  : 10, 10, 1  (nrow, ncol, nlyr)
resolution  : 10, 10  (x, y)
extent      : 0, 100, 0, 100  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
name        :       lyr.1 
min value   : 0.008326433 
max value   : 0.970400785 
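As a quick sanity check on the printed resolution, the cell size follows directly from the extent and the number of rows and columns. This arithmetic needs nothing beyond base R:

```r
# Cell size = extent divided by the number of cells along each axis
xmin <- 0; xmax <- 100; nc <- 10   # columns span the x direction
ymin <- 0; ymax <- 100; nr <- 10   # rows span the y direction
res_x <- (xmax - xmin) / nc        # 10, matching the printed resolution
res_y <- (ymax - ymin) / nr        # 10
c(res_x, res_y)
```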

SpatRaster objects can be plotted using the base plot() command:

plot(r)

The terra package offers a wide array of functions for dealing with gridded data, including the ability to read from many widely used file formats, like remote sensing images (e.g. GeoTiffs), NetCDF, and HDF formats. We will use it here to work with a Landsat 8 scene collected on June 14, 2017. The subset covers the area between Concord and Stockton, in California, USA. The files are contained in the zip file rs.zip. Download this (if you haven’t already) and unzip it in your data folder.

We will also need the shapefile of California places (ca_places.zip), so download and unzip this as well. We’ll read this in before starting:

ca_places <- st_read("./data/ca_places/ca_places.shp", quiet = TRUE)


Read and Write Rasters

To read in gridded data, use the rast() function. This will read in the Near Infrared (NIR) channel (B5):

b5 <- rast("./data/rs/LC08_044034_20170614_B5.tif")

b5
class       : SpatRaster 
dimensions  : 1245, 1497, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : 594090, 639000, 4190190, 4227540  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 10N (EPSG:32610) 
source      : LC08_044034_20170614_B5.tif 
name        : LC08_044034_20170614_B5 
min value   :            0.0008457669 
max value   :            1.0124315023 

When we print the SpatRaster object, the second line of the output lists the dimensions of the data: here, 1245 rows, 1497 columns, and 1 layer. The output also shows the resolution (30x30 m), the extent, the CRS, and a brief summary of the values.

We can write SpatRaster objects back to file using writeRaster() (I’ll bet you never thought it would be called that). You can write out to any format supported by GDAL; here we write out to a GeoTIFF. You can see the full list of available formats for reading and writing by running the writeFormats() function in the console. We’ll use this again after having worked with the data.

writeRaster(b5, 
            filename = "./b5.tif",
            overwrite = TRUE)


Raster CRS

As with the sf objects, we can check the coordinate reference system of the file we just read. This does not print very well, but if you look at the end you’ll see a reference to the EPSG code (32610).

crs(b5)
[1] "PROJCRS[\"WGS 84 / UTM zone 10N\",\n    BASEGEOGCRS[\"WGS 84\",\n        DATUM[\"World Geodetic System 1984\",\n            ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n                LENGTHUNIT[\"metre\",1]]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        ID[\"EPSG\",4326]],\n    CONVERSION[\"UTM zone 10N\",\n        METHOD[\"Transverse Mercator\",\n            ID[\"EPSG\",9807]],\n        PARAMETER[\"Latitude of natural origin\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8801]],\n        PARAMETER[\"Longitude of natural origin\",-123,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8802]],\n        PARAMETER[\"Scale factor at natural origin\",0.9996,\n            SCALEUNIT[\"unity\",1],\n            ID[\"EPSG\",8805]],\n        PARAMETER[\"False easting\",500000,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8806]],\n        PARAMETER[\"False northing\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8807]]],\n    CS[Cartesian,2],\n        AXIS[\"(E)\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1]],\n        AXIS[\"(N)\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1]],\n    USAGE[\n        SCOPE[\"Navigation and medium accuracy spatial referencing.\"],\n        AREA[\"Between 126°W and 120°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - British Columbia (BC); Northwest Territories (NWT); Nunavut; Yukon. United States (USA) - Alaska (AK).\"],\n        BBOX[0,-126,84,-120]],\n    ID[\"EPSG\",32610]]"

If the CRS is not set, you can set it using the crs() function and an EPSG code. For example, the following code would set the CRS to WGS84 (don’t run this as the CRS is already defined)

crs(b5) <- "EPSG:4326"

Again, this should only be used to set a missing CRS, not to change an existing one. Note that crs() is used for SpatRaster objects, and st_crs() for sf vector objects.


You can transform the CRS of a raster layer using project(), which again accepts an EPSG code:

b5_wgs84 <- project(b5, "EPSG:4326")

crs(b5_wgs84)
[1] "GEOGCRS[\"WGS 84\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n            LENGTHUNIT[\"metre\",1]]],\n    PRIMEM[\"Greenwich\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433]],\n    CS[ellipsoidal,2],\n        AXIS[\"geodetic latitude (Lat)\",north,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        AXIS[\"geodetic longitude (Lon)\",east,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n    USAGE[\n        SCOPE[\"Horizontal component of 3D system.\"],\n        AREA[\"World.\"],\n        BBOX[-90,-180,90,180]],\n    ID[\"EPSG\",4326]]"

We’ll keep the Landsat data in its UTM projection (10N), and reproject the CA place data to match:

ca_places <- st_transform(ca_places, 32610)


Basic Plotting

You can make a simple plot using the plot() function:

plot(b5, main = "Landsat 8 (B5)")

plot(st_geometry(ca_places), add = TRUE)


Summary Statistics

The function global() can be used to calculate summary statistics across a raster layer. To get the mean NIR reflectance (and its standard deviation):

global(b5, mean)
                             mean
LC08_044034_20170614_B5 0.2693644
global(b5, sd)
                              sd
LC08_044034_20170614_B5 0.111683


Subset Rasters

If we want to use only a subset of the original raster layer, the function crop() will extract only the cells in a given region. The region can be defined using another raster or vector object (e.g. an sf object), or with an extent created by ext():

# extent method
my_ext <- ext(c(xmin = 612500, 
                xmax = 617500, 
                ymin = 4196000,
                ymax = 4201000))

b5_sub <- crop(b5, my_ext)

plot(b5_sub)

We can also use an sf object to crop the data. Here, we’ll extract the polygon for Bethel Island from the ca_places object, and use this to crop the raster:

bethel <- ca_places %>% 
  dplyr::filter(NAME == "Bethel Island")

b5_sub <- crop(b5, bethel)

plot(b5_sub)
plot(st_geometry(bethel), add = TRUE)

Note that crop subsets the original raster to the extent (bounding box) of the Bethel Island polygon, rather than to the polygon borders themselves. This is because rasters are always rectangular. You can ‘hide’ the values of raster cells outside of a polygon by using the mask function. The raster has to remain rectangular, so this does not remove the cells outside the polygon; rather, it sets their values to NA.
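The effect of masking can be illustrated with a plain matrix (a hypothetical 5x5 grid, not the Landsat data): cells outside the region of interest are simply set to NA, while the grid stays rectangular:

```r
# A hypothetical 5x5 grid of values
m <- matrix(runif(25), nrow = 5)
# A logical 'mask': TRUE inside our pretend polygon, FALSE outside
inside <- matrix(FALSE, nrow = 5, ncol = 5)
inside[2:4, 2:4] <- TRUE
# Masking keeps the matrix shape but blanks the outside cells
m[!inside] <- NA
sum(!is.na(m))   # 9 cells retain their values
```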

b5_sub <- mask(b5_sub, mask = bethel)

plot(b5_sub)
plot(st_geometry(bethel), add = TRUE)

Extract Data

Values can be extracted from individual locations (or sets of locations) using extract(). This can take a set of coordinates in matrix form, or an sf or SpatVector object. To get the reflectance value at 615000 E, 4199000 N:

extract(b5, cbind(615000,4199000))
  LC08_044034_20170614_B5
1               0.2121801

You can also extract for multiple locations. Let’s generate a set of random points in Bethel Island, and then sample the reflectance value for each of these.

random_pnts <- st_sample(bethel, size = 20)

extract(b5, st_coordinates(random_pnts))
   LC08_044034_20170614_B5
1                0.4229290
2                0.4595357
3                0.3613828
4                0.3997461
5                0.3108318
6                0.2716444
7                0.3579781
8                0.3546601
9                0.4567164
10               0.3814862
11               0.3680406
12               0.3428626
13               0.4722222
14               0.4904172
15               0.4502973
16               0.2493290
17               0.3117643
18               0.3804235
19               0.4021750
20               0.4093966

You can also extract values within a polygon by replacing the point coordinates with an sf polygon:

b5_bethel <- extract(b5, bethel)

head(b5_bethel)
  ID LC08_044034_20170614_B5
1  1              0.10251180
2  1              0.08084705
3  1              0.05189565
4  1              0.17219034
5  1              0.29502234
6  1              0.31579795

By default, this returns the value of all pixels within the polygon. By adding the fun= argument, you can easily calculate summary statistics:

extract(b5, bethel, fun = 'median')
  ID LC08_044034_20170614_B5
1  1               0.3779947

Note that if the sf object has multiple polygons, it will return the summary statistic for each one. Let’s now extract the values for a different place (Oakley), and then we can compare them. We do this in a couple of steps. First get the polygon for Oakley, then extract using the two polygons combined.

oakley <- ca_places %>% 
  dplyr::filter(NAME == "Oakley")

b5_bethel_oakley <- extract(b5, rbind(bethel, oakley))

names(b5_bethel_oakley) <- c("ID", "B5")
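Once extract() returns a data frame with an ID column like this, comparing the two places is just a grouped summary. A base-R sketch with hypothetical reflectance values (not the real extracted data):

```r
# Hypothetical extracted values: ID 1 = first polygon, ID 2 = second
vals <- data.frame(ID = rep(c(1, 2), each = 3),
                   B5 = c(0.40, 0.45, 0.38, 0.20, 0.25, 0.22))
# Median reflectance per polygon
tapply(vals$B5, vals$ID, median)
```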

We’ll visualize the difference with ggplot2, which shows higher NIR reflectance values for Bethel, likely indicating higher vegetation cover.

ggplot(b5_bethel_oakley, aes(x = B5, fill = as.factor(ID))) +
  geom_histogram(alpha = 0.7, position = 'identity')
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


Raster Stacks

The b5 raster represents a single band from the Landsat 8 scene. More usefully, we can load several bands and combine them into a single object. Here, we’ll load the blue (B2), green (B3), red (B4), and near infrared (B5) bands:

# Blue
b2 <- rast('./data/rs/LC08_044034_20170614_B2.tif')
# Green
b3 <- rast('./data/rs/LC08_044034_20170614_B3.tif')
# Red
b4 <- rast('./data/rs/LC08_044034_20170614_B4.tif')
# Near Infrared (NIR)
b5 <- rast('./data/rs/LC08_044034_20170614_B5.tif')

Now we’ll create a raster stack with all 4 of these:

s <- c(b5, b4, b3, b2)

s
class       : SpatRaster 
dimensions  : 1245, 1497, 4  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : 594090, 639000, 4190190, 4227540  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 10N (EPSG:32610) 
sources     : LC08_044034_20170614_B5.tif  
              LC08_044034_20170614_B4.tif  
              LC08_044034_20170614_B3.tif  
              LC08_044034_20170614_B2.tif  
names       : LC08_04~0614_B5, LC08_04~0614_B4, LC08_04~0614_B3, LC08_04~0614_B2 
min values  :    0.0008457669,      0.02084067,      0.04259216,       0.0748399 
max values  :    1.0124315023,      0.78617686,      0.69246972,       0.7177562 

The metadata now shows that this contains 4 layers. You can also make the stack directly by passing a list of file names:

filenames <- paste0('./data/rs/LC08_044034_20170614_B', 1:11, ".tif")

landsat <- rast(filenames)

landsat
class       : SpatRaster 
dimensions  : 1245, 1497, 11  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : 594090, 639000, 4190190, 4227540  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 10N (EPSG:32610) 
sources     : LC08_044034_20170614_B1.tif  
              LC08_044034_20170614_B2.tif  
              LC08_044034_20170614_B3.tif  
              ... and 8 more sources
names       : LC08_~14_B1, LC08_~14_B2, LC08_~14_B3, LC08_~14_B4,  LC08_~14_B5,  LC08_~14_B6, ... 
min values  :  0.09641791,   0.0748399,  0.04259216,  0.02084067, 0.0008457669, -0.007872183, ... 
max values  :  0.73462820,   0.7177562,  0.69246972,  0.78617686, 1.0124315023,  1.043204546, ... 

This now contains all bands representing reflection intensity in the following wavelengths: Ultra Blue, Blue, Green, Red, Near Infrared (NIR), Shortwave Infrared (SWIR) 1, Shortwave Infrared (SWIR) 2, Panchromatic, Cirrus, Thermal Infrared (TIRS) 1, Thermal Infrared (TIRS) 2.
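Since the file-based layer names are long, one optional convenience step is to build a vector of shorter band labels (these labels are our own shorthand, not part of the file) and assign them to the stack:

```r
# Shorthand labels for the 11 Landsat 8 bands, in file order
band_labels <- c("ultra-blue", "blue", "green", "red", "NIR",
                 "SWIR1", "SWIR2", "pan", "cirrus", "TIRS1", "TIRS2")
length(band_labels)   # 11, one per layer
# names(landsat) <- band_labels   # uncomment to rename the layers in the stack
```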

If we now use the extract() function with this object, it will return values for all bands in the stack:

extract(landsat, cbind(615000,4199000))
  LC08_044034_20170614_B1 LC08_044034_20170614_B2 LC08_044034_20170614_B3
1               0.1691325               0.1547111               0.1395956
  LC08_044034_20170614_B4 LC08_044034_20170614_B5 LC08_044034_20170614_B6
1               0.1370149               0.2121801               0.1594604
  LC08_044034_20170614_B7 LC08_044034_20170614_B8 LC08_044034_20170614_B9
1               0.1526508              0.09177701             0.001236123
  LC08_044034_20170614_B10 LC08_044034_20170614_B11
1                 311.6736                  308.671

We can plot the four visible/NIR bands side by side, using the same grayscale for each so they can be compared directly:

par(mfrow = c(2,2))
plot(b2, main = "Blue", col = gray(0:100 / 100))
plot(b3, main = "Green", col = gray(0:100 / 100))
plot(b4, main = "Red", col = gray(0:100 / 100))
plot(b5, main = "NIR", col = gray(0:100 / 100))

The values in each layer range from 0 to 1, and the same scale has been used for each band, clearly showing the differences in reflectance across this set of wavelengths. For example, vegetation reflects more energy in the NIR than in other wavelengths and thus appears brighter; in contrast, water absorbs most of the energy in the NIR wavelength and appears dark.

Composite images

The bands can be combined to form composite images. Here, we’ll use the red, green, and blue bands to make a true color image. The bands are combined with the concatenation function (c()), and the order is important (R, then G, then B):

landsatRGB <- c(b4, b3, b2)

plotRGB(landsatRGB, stretch = "lin")

Another popular visualization in remote sensing is the “false color” composite, which combines the NIR, red, and green bands. This representation makes vegetation easy to pick out, as it shows up in bright red:

landsatFCC <- c(b5, b4, b3)
plotRGB(landsatFCC, stretch="lin")

Raster algebra

terra makes it easy to carry out simple raster algebra. Each band or layer is treated as a 2D array, which makes it possible to add, subtract, multiply, divide, etc. As a simple example, we can calculate the NDVI for the Landsat scene as

NDVI = (NIR - R) / (NIR + R)

NIR is band 5 and red is band 4:

ndvi <- (b5 - b4) / (b5 + b4)

plot(ndvi, col=rev(terrain.colors(10)), main = "NDVI")
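As a quick numerical check of the formula, here is the NDVI for a single pixel with hypothetical reflectance values (base R only, no raster needed):

```r
# Hypothetical reflectances for one pixel
nir <- 0.5   # band 5
red <- 0.1   # band 4
ndvi_px <- (nir - red) / (nir + red)
ndvi_px   # 2/3; high values like this are typical of dense vegetation
```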

We can also make a quick histogram of values to look for any outliers:

hist(ndvi, main = "NDVI")
Warning: [hist] a sample of 54% of the cells was used

We can then use the results to carry out some simple classification.

  • Vegetation (NDVI > 0.4). The clamp() function masks all values outside a given range (here, anything below 0.4):

veg <- clamp(ndvi, lower = 0.4, values = FALSE)

plot(veg)

  • Croplands (corresponding to the peak in NDVI values):

crops <- ndvi > 0.25 & ndvi < 0.3

plot(crops)

  • Water bodies (NDVI < 0):

water <- ndvi < 0

plot(water)

Exercises

The exercise for this module will get you to make some simple maps with new datasets. You should submit the results either as an R script with the commands you used plus a Word document (or equivalent) with any results or figures, or as an HTML page generated from an R Markdown document.

Exercise 1

In the zipfile countries.zip is a global shapefile with various socio-economic indicators for different countries. Load this file into R and make plots of any two of the following variables (the variable names are given in brackets). Try different color scales and transformations of the data to get the most informative maps.

  • Median GDP: (gdp_md_est)
  • Population: (pop_est)
  • Income group: (income_grp)

Exercise 2

Using the NDVI values you calculated in the raster section, calculate the median NDVI for Bethel Island and Oakley. (Optional - for a slightly more challenging exercise, extract all NDVI values for these two places and compare using a t-test.)